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We explore the mathematical and numerical aspects of reconstructing a potential 
energy profile of a molecular bond from its rupture time distribution. While reliable 
reconstruction of gross attributes, such as the height and the width of an energy 
barrier, can be easily extracted from a single first passage time (FPT) distribu- 
tion, the reconstruction of finer structure is ill-conditioned. More careful analysis 
shows the existence of optimal bond potential amplitudes (represented by an ef- 
fective Peclet number) and initial bond configurations that yield the most efficient 
numerical reconstruction of simple potentials. Furthermore, we show that recon- 
struction of more complex potentials containing multiple minima can be achieved 
by simultaneously using two or more measured FPT distributions, obtained under 
different physical conditions. For example, by changing the effective potential en- 
ergy surface by known amounts, additional measured FPT distributions improve 
the reconstruction. We demonstrate the possibility of reconstructing potentials with 
multiple minima, motivate heuristic rules-of-thumb for optimizing the reconstruc- 
tion, and discuss further applications and extensions. 
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1. Introduction 

In many applications, one wishes to infer properties of a material or a process in an 
interior region of a sample not readily accessible to experimental probes. Examples 
of such inverse problems involving boundary data include radiological imaging, 
where radiation passing through tissues is detected outside the sample, electrical 
impedance tomography, where potentials are measured on the exterior of a body, 
and seismology, where reflected waves are measured at the earth's surface. Such 
problems are often ill-conditioned: there may be several different interior structures 
that yield nearly the same measured boundary data. 

One type of "boundary" data that often arises in stochastic models is a first 
passage time distribution (FPTD), describing the probability of a random variable 
first reaching a particular value within a certain time window. Here, the bound- 
ary data is the probability flux out of the domain. Figure QJa) shows individual 
trajectories of a one-dimensional stochastic process and their corresponding first 
passage times. The FPTD is shown in figure D^b) along with its Laplace transform 
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in the inset. These types of first passage problems arise in many biophysical con- 
texts. For example, the voltage across a nerve cell membrane fluctuates due to noisy 
inputs from other neurons, and can be described by a biased random walk deter- 
mined by a constitutive voltage-current relationship intrinsic to the cell (Tuckwell 
et al. 2003) When the fluctuating voltage exceeds a threshold, the potential rapidly 
spikes before resetting. The interspike times define the first passage times of the 
fluctuating voltage from which one might wish to reconstruct the neuron's inherent 
current-voltage relationship. 

Stochastic inverse problems are typically ill-posed: there may be several different 
interior structures that could yield identical or nearly identical measured bound- 
ary data. Nonetheless, for many physical systems, reconstruction of constitutive 
relations from measured data can be cast in Sturm-Liouville form with an un- 
known spatially dependent coefficient (Levitan 1987, McLaughlin 1986) Given the 
eigenvalues of the problem and assuming a symmetric coefficient function, its full 
reconstruction is unique (Borg 1946). However, one eigenvalue spectrum is insuffi- 
cient to determine a general nonsymmetric coefficient (Borg 1946). For stochastic 
problems, the spectrum of the corresponding Sturm-Liouville problem cannot be 
readily extracted from data and algorithms developed specifically for reconstruc- 
tion through the eigenvalues (Brown et al. 2003, Rundell & Sacks 1992a, Rundell 
& Sacks 19926) are of limited use in our stochastic problem. This motivates the de- 
velopment of new algorithms and techniques that deal directly with the boundary 
data. 




time t time t 

Figure 1. (a) Three simulated realizations of a representative random walk and their 
first passage times ti. The random variable y(t) could represent the transmembrane volt- 
age of a neuron or the bond coordinate of an unfolding macromolecule. (b) Histogram 
of the first passage times of a stochastic process started at position x — y(0) = 0, 
W(x — 0,t), obtained from 2000 realizations of the process shown in (a). The inset shows 
W(s) = J °° W(t)e~ st dt, which is used extensively in this paper. An arbitrary potential 
was used to generate the data. 

In this paper, we investigate a stochastic inverse problem in the context of an- 
other system commonly encountered in biophysics: macromolecule unfolding and 
molecular adhesion bond rupturing. Macromolecular bond displacements are often 
described by a single bond coordinate, represented by a fluctuating Brownian "par- 
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tide" in a one-dimensional energy landscape. The metastable bond is considered to 
be broken the instant the bond coordinate reaches a critical extension. This prob- 
lem is of great interest in single-molecule biophysics, particularly in the context of 
dynamic force spectroscopy (DFS) (Evans et al. 1995). In DFS, a pulling force pro- 
tocol is applied to the bond and the force at the instant of rupture is recorded. The 
mean rupture force by itself would give very little information about the molecu- 
lar potential (Schlierf & Rief 2006) since many different potentials would yield the 
same mean rupture force. How much of the bond potential can be recovered from 
the measured rupture force distribution? All of the recent theoretical treatments of 
this problem have cither analysed the forward problem (Heymann & Griibmullcr 
2000), used physical approximations to derive simple force-dependent and time- 
dependent dissociation rates (Bell 1978, Walton et al. 2008), and/or considered 
simple 2-3 parameter single minimum potentials (Hummer & Szabo 2003, Dudko et 
al. 2008, Freund 2009). The rate of force increase as a function of displacement (the 
rupture stiffness) has also been incorporated into a procedure to fit basic parame- 
ters of simple potentials (Fuhrmann et al. 2008). However, by imposing such simple 
two or three parameter forms for the reconstructed potential, one loses details such 
as multiple minima. 

Here, we approach the inverse problem by allowing a wider class of potentials, 
including those with multiple minima. Within a class of potentials, we numerically 
determine the ones that best fit the entire measured FPTD. Although the difficulty 
of extracting eigenvalues from FPTD data is avoided, the inverse problem remains 
intrinsically ill-conditioned and it is not surprising that almost all studies have 
focused on reconstructing only two or three attributes of the stochastic process, 
typically, the energy barrier height and width. In §2, we formulate the problem 
through the backward equation of a Brownian process with a potential energy- 
derived drift. In §3, we decompose the drift function into basis functions and develop 
an iterative optimization procedure to find the coefficients of these basis functions. 
In §4, we show that using a single FPTD restricts the type of potentials we can 
reconstruct. We also show how the inverse problem can be optimized by tuning the 
amplitude of the unknown potential and the initial bond displacement. Another key 
finding is that multiple FPTDs greatly facilitate the reconstruction, allowing us to 
accurately determine potentials with multiple minima. We propose experimental 
protocols that can be used to generate these additional FPTDs. Finally in §5, we 
discuss limitations of our method and possible refinements. 

2. Stochastic Theory and the Inverse Problem 

The general problem of stochastic bond rupturing is geometrically complex, particu- 
larly when considering deformations associated with large macromolecules carrying 
many degrees of freedom. Although in principle these systems can be modeled by 
stochastic processes in higher dimensions, for simplicity, and we restrict our math- 
ematical analysis to a one-dimensional Brownian motion described by a diffusivity 
D{x) and a convective drift — D(x)(kBT)~ 1 (d$>(x)/dx) proportional to the force 
derived from a time-independent molecular bond potential $(x), and to the mobil- 
ity D(x)(kBT)~ 1 . Although we restrict ourselves to time-independent potentials, 
corresponding to static forces, the rupture force distribution can be transformed 
into a first rupture time distribution (FPTD) in the quasi-adiabatic limit (Dudko 
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Figure 2. (a) The solution to the forward equation l2,2l for the first passage time distribution 
(FPTD) as a function of starting position x with D = 1 , L = 1 and U (x) —4 — Wx + 6x 2 . 
The delta function at t = when x = 1 was approximated by a sufficiently narrow 
gaussian centered at t = 0. (b) For each starting position x, slices through the surface 
define the FPTD w(x,t). Two slices corresponding to the starting positions x — (solid) 
and x = 0.45 (dashed) are shown. Inset: The Laplace transform of the same two slices. 



et al. 2008). The continuous Brownian process can be described by the probability 
P(y,t\x)dy for the bond coordinate to be between positions y and y + Ay at time 
t, given that it started at position x at initial time t — 0. This probability density 
obeys the backward equation (Gardiner 2004) 

dP(y,t\x) D(x) d$ f dP(y,t\x) \ , d 2 P(y,t\x) 

-^r~ + fc^r & {—dx— ) = D{x) dx- ■ (2 - 1} 

Since the bond is irreversibly ruptured when stretched past a known position y = L, 
we impose the absorbing boundary condition P(y — L,t\x) = 0. The bond survival 
probability at time t, given that it started initially at position x is found from 
integrating the probability density over all the final coordinates that an unruptured 
bond can take, i.e., S(x,t) — J Q P(y,t\x)dy. From S(x,t), we define the FPTD 
w{x,t) = —dtS(x,t), which obeys 

dw(x,t) D(x) rfgfr) dw(x,t) _ ^ d 2 w{x,t) 

dt + k B T dx dx ~ (X) 8x 2 ' ( ' 

subject to initial condition w(x, 0) = 0, and boundary conditions d x w(x, t)\ x —o = 
and w(x = L,t) = S(t). 

In the forward problem, $(x) and D(x) are given and one solves equation (|2.2p 
to find the function w(x,t), as shown in figure^ Each fixed- a; slice of the surface 
w(x,t) represents the FPTD for a particle that started the random walk at position 
x. 

In the inverse problem, the functions Q(x) and D{x) are unknown and need to 
be determined from an experimentally measured or simulated FPTD and a known 
starting position x. In general, the unique pointwise reconstruction of both D{x) 
and $(x) from FPT data is impossible (Bal & Chou 2003). At best, only half of 
either D(x) or $(x) can be uniquely determined from a single FPTD (Bal & Chou 
2003, Chen et al. 1985). It has been shown that if we assume D is a known constant, 
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then <5>(x) is uniquely identifiable from a single FPTD provided it is already known 
over a certain interval within (0, 1] (Bal & Chou 2003). 

3. Reconstruction Algorithm 

Since in this problem, is not known on any requisite interval, it is not even clear 
whether can be uniquely reconstructed. Nevertheless in this section we express 
<f>(x) as a superposition of (smooth) basis functions and attempt to reconstruct the 
coefficients, deferring a rigorous analysis of our algorithm for a future publication. 
We find that representing $(a;) using a relatively small number of basis functions 
renders the problem computationally tractable, yielding a unique solution in many 
cases. 

Henceforth, we nondimensionalize the problem by measuring distance in units of 
L, time in units of L 2 / D, and the potential $(x) in units of the thermal energy fc^T. 
Finally, to avoid numerically representing the 5— function in the boundary condition 
w(l,t) = S(t), we work with the Laplace transform w(x, s) = J Q w(x,t)e~ st dt, 
which obeys the infinite set (for each s e M>o) of uncoupled ODEs 

^%^ + U{ X ) d -^ = s^s), (3.1) 

subject to the Laplace-transformed boundary conditions d x w(x, s)\ x =o — and 
w(l,s) = l. 

In equation (|3.ip . we have defined the dimensionless drift U(x) = — d$(x)/dx. 
The condition d x w(x, s)\ x= o = represents a reflecting boundary at x — and 
ensures a nonnegative bond coordinate. Equation p. II) is a differential equation 
in the initial bond position for the Laplace-transformed rupture time distribution 
w(x, s). 

Mathematically, our objective is to reconstruct U{x) and then extract, modulo 
an irrelevant constant, the unknown potential Q(x). However, to do so, we must 
choose a reduced representation of U (x) that renders, for a chosen value of x, and 
all s G M>o, w(x, s) as close as possible to the measured or simulated Laplace- 
transformed FPTD W(s). Since it is known that approximating a function using 
a basis of monomials leads to a very ill-conditioned problem (Keller 1975), we 
represent the convection in terms of orthonormal polynomials: 

n-l 

U(x) = J2 a Mx), (3.2) 

i=0 

where {a^} = a is the vector of expansion coefficients, and the first few orthonormal 
basis functions are 

u (x) = 1, U!(x) = V3(l - 2x), u 2 (x) = \/5(l - 6x + 6a; 2 ), . . . (3.3) 

Our method for reconstructing U(x) consists of using a spectral method (Trefethen 
2000) to repeatedly solve the forward problem equation (|3.1j) to refine our estimate 
for the potential^ 

f More general potentials and drift functions that diverge at x = lead to highly singular 
differential equations, but can still be solved using spectral methods (Trefethen 2000). 
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Starting with an initial guess for the drift (say, U(x) = 0, the "null" hypothesis), 
we solve equation (|3.ip for many positive values of s to obtain a numerical approx- 
imation for w(x, s; a). We then compute the "distance" between the w(x, s; a) and 
the given data W(s) using the objective functional 

/>oo 

11(a) = / \w(x, S ;R)-W(s)\ 2 g( S )d S , (3.4) 
Jo 

where g(s) is a function that weights FPT data differently for different s. By appro- 
priately adjusting U(x), implemented through small changes in a, 11(a) is decreased. 
The incremental adjustments in a are repeated until 11(a) is minimized. Although 
many different algorithms can be used to minimize 11(a), we first consider g(s) = 1 
and choose a safe-guarded Newton strategy that relies essentially on computing the 
Hessian of 11(a). Details of the algorithm are described in | Appendix A| 

4. Results and Discussion 

We test our algorithm and discuss reconstructing the drift function U(x) from 
(i) a single, perfectly measured distribution of rupturing times, and (ii) multi- 
ple perfectly measured distributions of rupturing times, realized under different 
experimental conditions. We first generate perfect "data" by solving the forward 
problem using a hypothetical target potential function $*(x) (and corresponding 
U*{x) — J27=o a t u i( x ))- After generating the data W(s) = w(x, s; a*), we pretend 
we did not know the coefficients a*, and try to reconstruct them by minimizing 11(a) 
through successive iterations k of the numerical algorithm detailed in | Appendix A| 
Starting from an initial guess for a(fc = 0) = 0, we investigate if and how a(fc) 
approaches a*, and the number of coefficients that can be reliably reconstructed. 
Using a single FPTD, we find that reconstruction of $>*(x) is badly conditioned for 
n > 4, but that using two or more distinct FPTDs allows us to easily find 5 or more 
coefficients of <f>*(x) in many cases. 

(a) Single measurement 

We first assume a target potential $*(x) = — ( 1+7 5 ^ )x + ^^-x 2 — VSx 3 corre- 
sponding to a target drift function U*(x) parametrized by (aj, al,al) = (|, v/^jj)- 
Figure[3][a) shows that starting with the initial guess U(x) = 0, minimizing the ob- 
jective function equation (|3.4p leads to accurate convergence to the unknown target 
drift U*(x) within ~ 10 iterations. We find in figure [3]Jb) that a five-parameter po- 
tential is typically a marginal case in that it can only be occasionally reconstructed, 
and only after a large number of iterations. However, we are typically not able to 
accurately reconstruct a potential with six parameters (see figure El^c)), regardless 
of the number of iterations. 

When the unknown target drift function U*(x) is structurally more complex, 
extremely slow or nonconvergence to a* arises because the curvature of n(a) near 

| If we had chosen to work in the time domain, a reasonable objective function that mea- 
sures the difference between measured and computed FPTDs would be 11(a) = J °° \w(x,t;a) — 
W(t)\ 2 g(t)dt. 
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Figure 3. Reconstruction of three, five and six parameter potentials. Row (a): 
reconstruction of a three parameter potential corresponding to drift coefficients 
(clq, 0,1,0,2) = (1/5, 9/10, y/3/20). Row (b): attempted reconstruction of a five 
parameter double-well potential with coefficients a* = (1/5, 3/5, 2/5, 3/5, v2/5). 
Row (c): failed reconstruction of a six-parameter potential defined by 
a* = (1/5, 1/2, 2/5, 1/2, 1/5, -y/13/50). In the second column, the coefficient values 
do, Oi, 02, «3, 04, 05 at each iteration are indicated by open circles, filled circles, open 
triangles, filled triangles, open squares, and filled squares, respectively. 



the true minimum, in at least one direction, becomes extremely small. Since our 
minimization algorithm relies on essentially inverting the n x n Hessian matrix 
Hij = <9 Qi 9 aj n| a=a * (see I Appendix A| , the mathematical feasibility of the recon- 
struction is limited by its condition number k = A max /A m j n . Here, A max and A m i n 
are the largest and smallest eigenvalues of H, representing the largest and smallest 
curvatures of n(a) at a*, along the corresponding eigendirections, respectively. In 
addition to increasing the number of eigendirections, increasing n rapidly decreases, 
in particular, the minimum curvature A m i n , thereby increasing k = A max /A m ; n and 
making the minimum in 11(a) harder to find. As shown in | Appendix B[ larger val- 
ues of i and j correspond to more rapidly oscillating basis functions that reduce 
the magnitude of Hij. This property renders the problem badly conditioned and 
is the underlying mathematical reason for the difficulty of extracting more than 
three parameters from a given potential landscape. To explicitly illustrate the ill- 
conditioning of the problem, we plot in figure EJa) the three-parameter objective 
function n(ao,ai,o,2) (with g(s) = 1) as a function of the parameters ao and a\. 
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Figure 4. (a) Objective function II(ao, 01, a%) = J °° \w(x — 0.3, s; ao, ai, a^) — u>(0.3, s; a*)| 2 ds 
for the potential shown in Fig 2(a) as function of ao and a±. Projected onto ao — ai 
space, IT exhibits a much smaller curvature in one direction compared to the orthogonal 
direction, (b) The minimum and maximum curvatures of II (a), and the inverse condition 
number. For constant weighting g(s) defined in equation (|3.4[) . we find that the condition 
number k decreases monotonically as the amplitude of the target potential A* — > 0. 



Although the global minimum in 11(a) occurs at (ao,a*), it is clear that the curva- 
ture near the minimum is extremely small along at least one direction, making the 
minimum difficult to find numerically. 

Since all three target potentials considered in figure[3]were chosen to have |a* | 2 = 
1, figure [3] fairly compares the reconstruction of different-shaped potential functions 
with equal amplitude While increasing the dimension n makes the problem more 
ill-conditioned, for fixed n, reconstruction efficiency may nonetheless depend on the 
typical magnitude of the potential to be reconstructed. To compare reconstructions 
of potentials of different expected magnitudes, we define the amplitude factor 



n-l 
i=0 

for each target drift function. While the amplitude A* needs to be found from 
reconstructing the values of a* , its value sets the scale of the unknown drift function 
relative to thermal diffusion and defines an effective Peclet number for this problem. 
Experimentally, the shapes of potentials are fixed by molecular details; however, the 
Peclet number A* is inversely proportional to temperature and can in principle be 
tuned experimentally. 

Figure SJb) shows that for a fixed-shape target drift function U*(x) of the form 
a* = A* x (1/5,9/10, y/3/20), and a single a FPTD measurement, the inverse of 
the condition number is maximized in the limit A* — > 0. Therefore, the problem 
has the best conditioning and the most efficient mathematical reconstruction in the 
zero Peclet number limit, when the potentials are weak. Computationally, a single 
FPTD data set arising from a vanishingly small drift perturbing the purely diffu- 
sive problem gives the most numerical "signal" for reconstructing the coefficients 
of U*(x). Although the magnitudes of aj are vanishingly small, their incremental 
effect on reducing the condition number k is nonetheless greatest in this limit. This 
optimal limit arises from a mathematical analysis and is not predicted by phys- 
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Figure 5. Inverse condition number as a function of starting position x and drift am- 
plitude (Peclet number) A* for three different potentials, (a) $*(:r) = — A* ^x 2 (b) 

$*(x) = A* [-^x+ 2^1 a; 2 ] (c) $*(x) = A* + ^fix 2 - ^3x 3 ] . The ratio 

is typically maximal for A* — > and a; ~ 0.75. 

ical considerations. However, system and experimental constraints may preclude 
measurement of effective potentials at extremely low Peclet numbers A* (high tem- 
peratures), suggesting that an optimal, intermediate temperature may still arise in 
practice. 

In figure [SJ the behavior of t\> clS cL function of starting position is even more 
intriguing. For constant g{s) and all potentials we tested, the optimal value of the 
starting position occurs roughly near x = 0.7 — 0.9. This starting position is close 
to the rupture point at x = 1 and is somewhat insensitive to the amplitude A*, 
except for very large A*. The robustness of this optimal starting position arises 
from analysing the Hessian matrix, in particular the dependence of its condition 
number on x. 

From the form of (see equation (|B 2|1 ). we can show numerically that A m i n 
has a maximum for x ~ 0.7— 0.9 and that the behavior of k _1 is rather insensitive 
to changes in A max ; thus, k" 1 is typically maximal near x = 0.75. For the three 
qualitatively different potentials used in figure [SJ the optimal starting positions all 
fall approximately within x = 0.7 — 0.9 for a wide range of amplitudes A* . In these 
examples, the best conditioning occurs in the limit A* — > 0, consistent with figure 
[4f b) . For nonconstant g(s), the approximate optimal starting position x typically 
ranges from 0.5 to 0.9, depending on the form of g(s) (cf. figure [7] in |Appendix B[ ). 

(b) Multiple measurements 

Since a single measurement W(s) is typically insufficient to reconstruct the 
potential well beyond three coefficients, even after optimizations with respect to 
Peclet number A* and starting position x, we consider how additional data can be 
used to refine the reconstruction of U*(x). As indirectly suggested by the analysis 
of varying A* and x, an unknown potential can be changed by a specified amount to 
yield a FPTD different from that of the original unchanged potential. By imposing 
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any number of perturbations, multiple FPTD data W can be measured and used 
to aid the reconstruction of the original potential. 

We propose three protocols for modifying the potential to be reconstructed. 
Experimentally, these protocols correspond to changing the system temperature, 
applying then quickly removing a force to change the starting position, and adding 
an applied force at the start of the stochastic process. Mathematically, these pertur- 
bations correspond to specific changes in A* , x, and the form of the potential $*, 
respectively. The multiple FPT distributions, measured under different conditions, 
can then be combined into a multi-distribution objective function. We summarize 
the protocols below: 

• Changing amplitude via temperature - One way to obtain additional data 
is by changing the amplitude (or effective Peclet number) A* of the unknown drift. 
For each distinct value of A* , a separate FPTD w(x, s; a*) can be measured. These 
different FPTD's all arise from bond potentials with the same underlying shape, 
and can be used together to better reconstruct $*(x). While the absolute value of 
A* needs to be determined from the reconstruction of a* , the relative temperature 
at which a second measurement is taken can be used to determine the ratio 9\ = 
At /A* v 

• Tuning starting positions - By adding a force to the system before the start of 
the process, one can adjust the initial position x of the bond. At t — 0, this force is 
released, and the stochastic process proceeds under the original target drift U*(x), 
provided the potential relaxes quickly to Stochastic bond dynamics starting 
at different positions x yield different measured FPT distributions. 

• Adding Probe Forces - Finally, one can add known potentials to the original 
target potential immediately after the start of the stochastic process to obtain 
additional FPT distribution data. Here, — > <£*(x) + A$(x), where A$(x) is 
known. The associated drift then changes according to U*(x) — > U*(x) + AU(x), 
where AU(x) is implemented through a known change in the expansion coefficients 
Aa and represents an externally applied force imposed by e.g., a pulling device such 
as an AFM tip or an optical tweezer. The associated external potential in such cases 
may be of the form A$(x) = — F cxt x — Kx 2 /2, where F ext is the externally applied 
time-independent force and K is the elastic response of the pulling device. In this 
case, the new total bond potential $* + AQ(x) induces a drift U*(x) + AU defined 
by a* + Aa where Aa = F oxt + K/2 and Aa x = -K/(2t/$). This new drift gives 
rise to another, different FPTD. 

An objective function that incorporates all M FPT distributions measured un- 
der M different conditions described above can be defined as 

±I M (a) - V / [w(x m , s; e* m ,a+ Aa m ) - W( S )} 2 g m (s)ds, (4.2) 

where x m , Aa m , and 0* n = A* m jA\ denote the known starting position, added 
pulling force, and relative temperature of the m th measurement, respectively. The 
function g m (s) weights each of the m measurements differently. If the data W in 
equation (I4.2j) were generated from a target drift U*(x), as is the case in our analy- 
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Figure 6. Attempted reconstruction of 5-parameter potential wells (solid black) us- 
ing single and double data sets, (a) Blue dotted and red dashed lines: reconstructions 
from single data sets generated using A* = 6.11755... and = A*/5 respectively. 
Circles: reconstruction using both data sets. Original potential was parametrized by 
a* = (-1/10, ll\/3/5,89/(14\/5),8/V7, 50/21), starting position was x = 0.170. (b) 
Blue dotted and red dashed lines: reconstructions from single data sets generated us- 
ing xi — 0.182 and X2 = 0.727 respectively. Circles: reconstruction using both data 
sets. Original potential was parametrized by a* = (2/3, 2, 5/3, 2, 4/3)/\/l3), starting 
position was x = 0.170. (c) Blue dotted and red dashed lines: reconstructions from 
single data sets generated without and with probe force AU = x. Circles: reconstruc- 
tion using both data sets AU = and AU = x. Target potential was parametrized by 
a* = (2/3, 2, 5/3, 2, 4/3)/ V / 13), starting position was x = 0.642. 



ses, then it is denned as W(s) = w(x m , s; 9^, a* + Aa m ). Measured data W should 
be obtained with different starting position x m , applied force Aa, and/or different 
temperature ratios 9* n with at least one of the known parameters x m , Aa m , A* n 
different among the M measurements. Multiple data sets provide additional con- 
straints, increasing the curvature of the objective function near a*. In general, the 
smallest eigenvalue A m in of the Hessian matrix associated with IIm increases with 
M. Upon minimizing the multi-FPTD objective function ILm, we obtain a*. 

To illustrate how additional data can improve potential reconstruction, we com- 
pare how including two FPT distributions (M = 2) in our objective function IT 
simultaneously affects reconstruction relative to using each FPTD separately. The 
two distributions will arise from two ideal measurements taken under two different 
conditions within each of the proposed experimental protocols described above. In 
Figure [SJa) , we attempt to reconstruct the five-parameter, double- well potential 
$*(x) = -28x + ^-x 2 - 307x 3 + 290x 4 - 100a; 5 using two "temperatures." This 
potential corresponds to A\ = |a*| = 6.11755 We then generated data asso- 
ciated with $*(x)/5, corresponding to A* 2 — A*/5 — 1.22351.... Reconstruction 
of the original using each individual FPTD fails, as does using both FPTD 
data sets. In figure |UJb), we see that while using data sets corresponding to either 
initial position x% — 0.182 or x-i = 0.727, the algorithm fails to reconstruct the 
target potential. However, using both initial positions together allows us to accu- 
rately determine $*(a;). Similarly, adding the perturbing potential A$2 = —x 2 /2 
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(AC/2 = x) provides another FPTD that allows accurate reconstruction of a double- 
well potential (figure HJc)). 

5. Summary and Conclusions 

We have analysed the mathematical aspects of reconstructing the drift of a stochas- 
tic process from perfectly measured first passage time distributions. In practice, 
insufficient number of bond rupture events are currently measured to enable quan- 
titative potential reconstruction; therefore, we used numerically generated data to 
illustrate our main mathematical results. For single distributions, only very coarse 
attributes (approximately three parameters) can be reconstructed. We demonstrate 
how to optimize the efficiency of the reconstruction by controlling the effective am- 
plitude or Peclet number A* and starting position x of the stochastic process. If 
only one FPTD can be measured, our analysis suggests that A* — > and x ~ 0.75 
are the most likely parameters to give the best chance for reconstructing relatively 
simple potentials. However, these findings were found numerically by assuming per- 
fect data, uniform diffusivity, precisely defined starting positions x, and a constant 
weighting function g(s). In practice, finite time resolution, noisy data, and other 
experimental limitations may be accounted for by a more suitable weighting func- 
tion g(s) (or g(t) if II were a functional of w(x, t) and the data W(x, t)). We show in 
|Appendix B| that the optimal parameters A* and x can change when a nonconstant 
weighting function g(s) is used. While the optimal values of A* — > and x ~ 0.75 
(for g(s) = 1) were found numerically, without physically realistic limitations, they 
nonetheless provide a possible experimental starting point. 

We also showed that additional measurements in the form of multiple FPTDs 
can be used to provide dramatically better conditioning of the problem, allowing 
finer details of the drift function to be extracted. The total objective function in- 
cluding the constraints from all M measurements has a sharper minimum, increas- 
ing the efficiency of standard optimization algorithms. We proposed three ways of 
obtaining additional measurements under different experimental conditions: tuning 
the effective amplitude or Peclet number A* through the system temperature, ad- 
justing the starting position x via an initially applied force, and adding a known 
"probe" force through a potential A$. This later potential can be realized in a 
number of ways, from directly mechanically pulling on the bond, to using muta- 
genesis to systematically change local properties of the bond energy profile. Such 
mutant bonds may provide additional FPTD data facilitating reconstruction of the 
original "wild-type" potential. 

A number of refinements are suggested by our analysis, and some are discussed 
in the Appendices. For example, rather than Laplace-transforming the data, one can 
directly fit to W{x, t). Although this approach is computationally more expensive, 
it would allow us to treat time-dependent potentials U{x,t), and directly analyze 
dynamic force spectroscopy experiments (Evans et al. 1995, Heymann & Grubmiiller 
2000, Fuhrmann et al. 2008), or scenarios in which the temperature is changed in 
a time-dependent way (Getfert & Reimann 2009). Moreover, specific to the bond 
rupture problem, data involving bond coordinates as a function of time, if accurately 
measured, can also be incorporated into the objective function. These additional 
data may be useful in combating the noise problem and help improve the overall 
conditioning. 
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Other more mathematical refinements and extensions can also be implemented, 
including exploring effects of using different basis functions for U(x), using more 
sophisticated optimization methods, quantifying the reconstruction efficiency for 
large M, defining the experimentally-imposed weighting function g(s), reconstruc- 
tion of the diffusivity D(x) itself, and systematically exploring the effects of noise 
in the data. Here, a Bayesian approach to estimate likelihood functions for the coef- 
ficients a* , or information criteria to choose the size n of the basis expansion might 
be useful (Getfert et al. 2009). 

The authors thank S. Getfert, A. Fuhrmann, and A. Landsman for helpful comments. This 
work was supported by NSF grant DMS-0349195 and NIH grant K25 AI058672 



Appendix A. Numerical Methods 

(a) Numerical scheme for solution of the Backward Equation 

In the forward problem, with D(x) and given, equation f|2.2[) can be solved 
using standard finite difference schemes. Figure (Hb) shows numerically computed 
FPTDs for two different starting positions with D = L = 1, U(x) = 4 — lOx + Qx 2 . 
The singular boundary condition w(x — l,t) — S(t) is treated by taking Laplace 
transforms in time of equation (|2.2j) and solving equation (|3.1j) for all values of the 
Laplace-transform variable s. Moreover, the numerical solution of equation (13.11) for 
a set of values s can be found much more quickly than solving the full partial differ- 
ential equation on a large x — t grid. For D — 1 and a given drift function U (x), we 
use a spectral method (Trefethen 2000) to solve the Laplace-transformed Backward 
Equation (|3.1[) . First, the spatial domain is mapped from [0,1] to [—1,1] using a 
change of variable. Then, the function w(x, s) is represented by w(xi, s) = ibi and 
interpolated between the TV Chebyshev points xi — cos (j^zrr^j with polynomials. 
The resulting N x N system of equations for wt are 

QijWj = Si ti , (Al) 

where the matrix Qij is 
Qij — Sij, 

Qij = 4(D^)y + 2(UD JV ) tf -«*i i , » = 2,...,JV-1, j = 1, 2, N 
Q Nj = (Djv)jvj) i = 1,2, ...,N, 

(A 2) 

where Dat is the usual N x N pseudospectral differentiation matrix (Trefethen 
2000) and the diagonal matrix U is defined by 

U tJ =U(x l )8 l] . (A3) 

We used N = 51 spectral points in all of our computations within the iterative algo- 
rithm. Fixed— x slices of the numerically obtained functions w(x, s) are qualitatively 
similar to the plots shown in the inset of figure [D^b) • 
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(b) Evaluation of the objective function 

In our analysis, we generate data by numerically computing distributions derived 
from a target drift function U* (x) (denned by its polynomial coefficients a* ) . Since 
the data is generated numerically, we use W(x, s) = w(x, s;a*) in the objective 
function and write 

/>oo 

n(a) = / [w(x,s;a) - w{x,s;a*)] 2 g(s)ds. (A 4) 

Jo 

Note that the i th moment of the FPTD is given by 

>,d e w(x, s) 



( T ) = (-ir 



(A 5) 



Therefore for two FPTDs with identical first few moments, their first few derivative 
are also identical. Such FPTDs can be distinguished only their difference at larger 
values of s (and the function g(s) in equation (| A 4[) can be used to weight these 
differences accordingly). Only information contained in the tails of the Laplace- 
transformed distributions can distinguish two FPT distributions with equal lower 
moments. Therefore for the algorithm described in |Appcndix A| to be effective, 
a numerical approximation to the integral in equation (|A4I) must evaluate the 
integrand for sufficiently large values of s. This can be done by mapping s G [0, oo] 
to £ 6 [0,1] through a change in variable s(£) = £/(l — £), and computing 

II(a)= j\w(s(0^)-u>(s(0;Bil} 2 9(s(0)-^^- (A6) 

In order to choose g(s) such that 11(a) remains convergent, we assume that g(s) 
has no singularities in s £ (0, oo) and consider the behavior of the integrand at the 
end points s — and s — oo through the asymptotic expansions 

w(s) = f™e- st w(t)dt 

ujW(0)n! ^ 1 (A 7) 



.-- : 

n=0 



and 



#(8) .f;f!^, .,«, (A8) 



n=0 



Since w(t = 0;a) = w(t = 0;a*) = for any two sets of drift coefficients a and 
a*, the asymptotic expansion (|A 7p implies that [w(s;a) — w(s;a*)] 2 — 0(s~ 4 ) as 
s — > oo. If the first k time derivatives of w(t; a) and w(t; a*) match, equation (|A 7\i 
implies that [w(s;a) — w(s;a*)] 2 = 0(s~ 2 ( k+2 )). For a weighting function of the 
form g(s) = s q , the integrand in equation ([A 6ft is 0((1 — £,) 2 ~ q ) as ^ — >■ 1 and is 
intcgrable at £ = 1 provided q < 3. 

Since w(s = 0;a) = w(s = 0;a*) = 1, the asymptotic expansion (|A 8| implies 
that [w(s;a) — w(s;a*)] 2 = 0(s 2 ) as s — >• 0. If the first k s-derivatives agree (i.e. 
the first k moments of w(t;a) and w(t;a*) are identical), equation (|A 8|) implies 
that [w(s;a) — w(s;a*)] 2 = 0(s 2 ( fe+1 '). If g(s) — s q then the integrand in equation 
~6|) is 0(^ 2+q ) as £ — > and is integrable at £ = provided q > — 3. 
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To summarize, if we take A(s) = s q , then convergence of the integral in equation 
(|A 61) requires —3 < q < 3. When evaluating 11(a), a fourth order open trapezoid 
rule (that does not require evaluation of the integrand at the end points) was 
used and typically 100-1000 uniformly spaced trapezia were found to give sufficient 
accuracy for the plots shown in Figs. [3] and |H 



(c) Minimization of 11(a) 

The linear system (|3.1[) must be solved for many different values of s so that 
it can be used in the discrete approximation to the objective function (|3.4|) . To 
minimize (|3 .4[) . we use a safe-guarded Newton strategy: 

a(fc + 1) = a(Jfe) - crG- 1 (a(fc))V a n(a(/c)). (A 9) 

Here, G is a positive definite matrix and a > is the step size chosen to minimize 
II(a(fc + 1)) along the descent direction G _1 V a IL 

To compute G, we adopt the following procedure. If the Hessian = <9 ai <9 ai n(a) 
is positive definite, we set G = H. If the Hessian H is not positive definite, we choose 
a small Tikhonov regularization (Vogel 2002) parameter a such that H + al is safely 
positive definite (I is the identity matrix) and set G = H 4- al. In either case, G _1 
is also positive definite and moving in the direction of — ctG -1 V a n is guaranteed to 
decrease n(a(fc + 1)) for sufficiently small a. The value of a is found by performing 
an exact line search to minimize H(&(k + 1)) along the descent direction. All Jaco- 
bian and Hessian matrices are approximated numerically using a suitably small da, 
typically on the order of 10 -5 . The algorithm terminates when the relative change 
in the objective function is less than 10~ 3 . 

We represent the drift function U(x) using orthonormal polynomials on [0, 1] 
given in equation f|3 . 3f) . Often, the potential and drift arising from molecular in- 
teractions diverge as x — > (such as in the Lennard- Jones potential). In this case, 
and U(x) could be represented using basis functions with the correct divergent 
behavior for x —> 0. Although the Laplace-transformed backward equation (13. ip has 
an irregular singular point in this case, the spectral method retains its ability to 
find solutions as long as x — 1 is removed from Chebyshev grid. 



Appendix B. Analysis of Eigenvalues and Condition 

Numbers 

Since the ease of minimizing n(a) is quantified by the condition number k = 
Amax/Amin of the Hessian H, we now consider the behavior of the minimum and 
maximum eigenvalues A m i n and A max . If k = 0(1), the problem is well-conditioned; 
if k 3> 1, the problem is badly conditioned. Clearly, the one parameter optimization 
problem has k = 1 and is well-conditioned. However, when the number of parame- 
ters increases, it is desirable to find conditions under which k _1 is maximized. 
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Figure 7. Inverse condition number kT 1 in the A* — > limit as a function of starting 
position x and the weighting function g(s) used in the objective function 11(a). In this 
limit all potentials (with a fixed number of parameters) have the same Hessian, (a) Two 
parameter potentials with weighting function g(s) = s q . (b) Three parameter potentials 
with weighting function g(s) — s q . (c) Two parameter potentials with weighting function 
g(s) — e~ ps . (d) Three parameter potentials with weighting function g(s) = e~ ps . 



(a) Small Peclet number analysis 

We now analyze k in the <C 1 limit. The components of the Hessian at a* 
(the drift coefficients of the target potential) are given by 



H% _ a 2 n(a) 

* daidaj 



= 2 



, dw dw 
dai <9a, 



ds. 



(Bl) 



For weak potentials, we find solutions to w" + U(x)w' = sw in the U(x) — > limit. 
If U{x) — 0(A*), we expand the solution in the form w(x, s) — J2m=o ^rnix, s) 
where w m = 0(A* m ), and use the boundary conditions u>o(l, s ) = 1, w' (0, s) = 
and w m (l,s) = w' m (0,s) = (m > 1) where primes denote differentiation with 
respect to x. The first two terms in the expansion are 



w (x, s) 
Wi(x,s) 



cosh < 



coshy 7 ! 



cosh ,/s 



G(x, x')U(x') sinh yfsx'&x' ', 



where the Green's function 



G{x,x') 



sinh y/s(l — x>) cosh v / sa;< 
\fs cosh \fs 
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satisfies G" — sG = — 5(x — x') and x < (x > ) is the lesser(greater) value of x, x'. 
Upon using the full orthonormal polynomial expansion U(x) = Y^iLa a i u i{ x ), we 
find explicitly 



H iJ l = \ ds [ [ G(x,y)G(x,y')u l (y)u j (y')smh(y/sy)smh( y /sy')dydy'. 

Jo cosh y/s Jo Jo 

(B2) 

Since all elements of H are independent of the drift coefficients otj, they are also 
independent of A*, and so are all eigenvalues. Therefore, as a function of A*, the 
inverse condition number 1/k = A m j n /A max approaches a constant as A* — » 0. 
Moreover, for all forms of g(s) tested, we find numerically that is maximal in 
the A* — > limit. These results confirm the numerical data in figureEfb) and figure 

El 

Within the A* — > limit, different weightings g(s) can also be used to better 
maximize k . Using the asymptotic form (|B 2[) for the Hessian and g(s) = s q 
(—3 < q < 3), we plot function of starting position x and exponent q in 

Fig [7] Note that q > puts more weight into the tails of [w(x, s; a) — w(x, s; a*)] 2 , 
accentuating differences in higher moments of the FPTD. If q < 0, more weight is 
given to small values of s. For each value of q, there is an optimal starting position 
x* that minimizes the condition number and greatly improves the efficiency of re- 
constructing the bond potential. Although in §4 we discussed changing the starting 
positions to optimize the reconstruction, in cases where the starting position cannot 
be controlled, another strategy may be to estimate an optimal q = q* from figure [7| 
and minimize IT(a) using the weighting function g(s) — s q . We also experimented 
with weighting functions of the form g(s) — e~ ps for p > 0: see figure[7fc) and[7Jd). 
This class of weighting functions seems to give poorer conditioning compared to 
g(s) = s q since k _1 is generally smaller. 



(6) Conditioning with multiple data sets 

In figure [51 we assumed g{s) = 1 and plot the inverse condition number k" 1 — 
Amin/Amax as a function of the Peclet number A*. The potential used was propor- 
tional to the one in figure E^a): $*(af) = A* -(±±ZyJ)o: + ^Ax 2 - Vix* . The 
starting position was x = 0.433. The thin solid curve in figure [S] is taken from fig- 
ure Edb) and corresponds to k _1 when only one FPTD data set (M = 1) is used. 
We compare these values to those when two FPTDs (M = 2) are used as data in 
Iljw(a). The second data set was generated using each of the protocols discussed 
in Section 4(b). The dotted, dashed-dotted and dashed curves were generated from 
the Hessian of Hm(bl) by using both A* and A*/5, two starting positions x\ = 0.433 
and X2 — 0.65, and both the original potential &*(x) and $*(x) — 5x 2 /2, respec- 
tively. In each case we see a different improvement in the conditioning at each value 
of A*. 

Since the largest eigenvalue A max does not change much with increasing M, the 
improvement in conditioning is due mostly to increasing the smallest eigenvalue 
Amin (not shown). Reducing the Peclet number A* only improves the conditioning 
for large A*. Increasing the contraction factor in the two values of ^4* (e.g. from 
A* — > A* /5 to A* —> A*/10) further improves the conditioning by increasing k _1 at 



Article submitted to Royal Society 



18 



P.-W. Fok and T. Chou 



10 



-2 




10 



0.01 



0.1 

Amplitude A 



1 



10 



* 



Figure 8. The inverse condition number function of the potential's magnitude 

A* for one and two data sets. The thin solid curve corresponds to one FPTD (M = 1), 
while broken curves correspond two FPTDs (Af = 2) obtained under different conditions. 
The dotted curve represents k _1 when A* and A* /5 are used in the multi-distribution 
objective function IIif(a) (see equation (J472J) ) . The dot-dashed curve corresponds to 
when xi = 0.433 and X2 = 0.65. Finally, the long dashed curve represents the inverse 
condition number when two target drifts U* (x) and U* (x) + 5x axe used. The weighting 
functions g m (s) = 1 in all computations. 



ever smaller values of A* . In this example, changing the starting position is perhaps 
the most reliable way of facilitating the reconstruction: the condition numbers k 
are decreased by at least an order of magnitude over a wide range of A* and the 
improvement becomes even better for larger A* . Finally adding a probe force greatly 
enhances the reconstruction for moderate A* — 0(1) and there is now an optimal 
A* (and hence system temperature) where the reconstruction is easiest. This is in 
contrast to the M = 1 case, here we always find a monotonically increasing kT 1 as 
A* decreases. 



Appendix C. Reconstruction of many-parameter potentials 

Using additional FPT data, we demonstrate the feasibility of reconstructing com- 
plex potentials that are described by many parameters. 

In figure [5] we successfully reconstruct 7-parameter and 8-parameter potential 
wells containing multiple minima using g(s) — 1 and three (M — 3) FPTDs. For 
both (a) the 7-parameter and (b) the 8-parameter potential, we see that any combi- 
nation of two first passage time distribution data sets fails to accurately reproduce 
the original $*(izi). However, using the three denned data sets (M = 3), we were 
able converge to the correct $*(x) in fewer than 20 iterations. Importantly, we 
were able to quantitatively resolve the multiple minima. It should be noted, how- 
ever, that with the current optimization algorithm we were only able to obtain 
about 2 significant digits of accuracy on the coefficients for these potentials. 
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Figure 9. Reconstruction of seven and eight parameter potentials with unit am- 
plitude A* = 1 and drift coefficients (a) a* = (1/2, 1, 1, 3/2, 1, 3/2, l)/VTi, 
and (b) (1/2, 5, 2, 5/2, 1, 5/2, 2, 15/2)/ v^M- In (a) the three different protocols, 
(xi = 0.07, A[/i = 0), (x 2 = 0.404, AU 2 = 0) and (x 3 = 0.404, AU 3 = x), were 
used to generate three data sets (in each case, A* = 1), denoted by FPTDi, FPTD2, 
and FPTD3, respectively. In (b) the three different protocols (xi — 0.06, AUi = 0), 
(x2 = 0.323, AU2 ~ 0), and (x 3 = 0.323, AU 3 = 5a;) were used to generate three data sets 
also denoted by FPTDi, FPTD 2 , and FPTD3 respectively. 
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